Unknown Malicious Code Detection – Practical Issues

نویسندگان

  • Robert Moskovitch
  • Yuval Elovici
چکیده

The recent growth in Internet usage has motivated the creation of new malicious code for various purposes, including information warfare. Today’s signature-based anti-viruses can detect accurately known malicious code but are very limited in detecting new malicious code. New malicious codes are being created every day, and their number is expected to increase in the coming years. Recently, machine learning methods, such as classification algorithms, were used successfully for the detection of unknown malicious code. These studies were based on a test collection with a limited size of less than 3,000 files, and the proportions of malicious and benign files in both the training and test sets were identical. These test collections do not correspond to real life conditions, in which the percentage of malicious files is significantly lower than that of the benign files. In this study we present a methodology for the detection of unknown malicious code. The executable binary code is represented by n-grams. We performed an extensive evaluation using a test collection of more than 30,000 files, in which we investigated the imbalance problem. Five levels of Malicious Files Percentage (MFP) in the training set (16.7, 33.4, 50, 66.7 and 83.4%) were used to train classifiers. 17 levels of MFP (5, 7.5, 10, 12.5, 15, 20, 30, 40, 50, 60, 70, 80, 85, 87.5, 90, 92.5 and 95%) were set in the test set to represent various benign/malicious files ratio during the detection. Our evaluation results suggest that varying classification algorithms react differently to the various benign/malicious files ratio. For 10% MFP in the test set, representing real life conditions, in general the highest performance achieved for the use of less than 33.3% MFP in the training set, and in specific classifiers was above 95% of accuracy was achieved. Additionally we present a chronological evaluation, in which the dataset from 2000 to 2007 was divided to training sets and tests sets. Evaluation results show that an update in the training set is needed.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Unknown Malcode Detection Using OPCODE Representation

The recent growth in network usage has motivated the creation of new malicious code for various purposes, including economic ones. Today’s signature-based anti-viruses are very accurate, but cannot detect new malicious code. Recently, classification algorithms were employed successfully for the detection of unknown malicious code. However, most of the studies use byte sequence n-grams represent...

متن کامل

Unknown malcode detection - A chronological evaluation

Signature-based anti-viruses are very accurate, but are limited in detecting new malicious code. Dozens of new malicious codes are created every day, and the rate is expected to increase in coming years. To extend the generalization to detect unknown malicious code, heuristic methods are used; however, these are not successful enough. Recently, classification algorithms were used successfully f...

متن کامل

Malicious Code Detection Using Active Learning

The recent growth in network usage has motivated the creation of new malicious code for various purposes, including economic and other malicious purposes. Currently, dozens of new malicious codes are created every day and this number is expected to increase in the coming years. Today’s signature-based anti-viruses and heuristic-based methods are accurate, but cannot detect new malicious code. R...

متن کامل

A Chronological Evaluation of Unknown Malcode Detection

Signature-based anti-viruses are very accurate, but are limited in detecting new malicious code. Dozens of new malicious codes are created every day, and the rate is expected to increase in coming years. To extend the generalization to detect unknown malicious code, heuristic methods are used; however, these are not successful enough. Recently, classification algorithms were used successfully f...

متن کامل

Malicious Code Outbreak Discovery: Issues and Approaches

In the exploration of solutions to recognize and mitigate self-propagating malicious code we may consider a variety of approaches: honeypots, advances in anti-virus technology, inline network application guards, or tamper-resistant execution environments such as sandboxing and constraint monitoring, to name a few. Solutions to this problem may ultimately encompass multiple approaches. While we ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008